A Theoretical Analysis of Contrastive Unsupervised Representation Learning
Recent empirical works have successfully used unlabeled data to learn feature
representations that are broadly useful in downstream classification tasks.
Several of these methods are reminiscent of the well-known word2vec embedding
algorithm: leveraging availability of pairs of semantically "similar" data
points and "negative samples," the learner forces the inner product of
representations of similar pairs with each other to be higher on average than
with negative samples. The current paper uses the term contrastive learning for
such algorithms and presents a theoretical framework for analyzing them by
introducing latent classes and hypothesizing that semantically similar points
are sampled from the same latent class. This framework allows us to show
provable guarantees on the performance of the learned representations on the
average classification task that is comprised of a subset of the same set of
latent classes. Our generalization bound also shows that learned
representations can reduce (labeled) sample complexity on downstream tasks. We
conduct controlled experiments in both the text and image domains to support
the theory.
Comment: 19 pages, 5 figures
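The contrastive objective described above, pushing the inner product of representations of similar pairs above the inner products with negative samples, can be sketched as a logistic loss. This is a minimal illustration with toy vectors; the function name and dimensions are assumptions, not from the paper:

```python
import numpy as np

def contrastive_loss(f_x, f_x_pos, f_x_negs):
    """Logistic contrastive loss: the inner product with the similar point
    should exceed, on average, the inner products with negative samples."""
    pos = f_x @ f_x_pos                   # similarity to the "similar" pair
    negs = f_x_negs @ f_x                 # similarities to negative samples
    return np.log(1.0 + np.sum(np.exp(negs - pos)))

# Toy representations: a point, a nearby "similar" point, random negatives.
rng = np.random.default_rng(0)
f_x = rng.normal(size=8)
f_x_pos = f_x + 0.1 * rng.normal(size=8)
f_x_negs = rng.normal(size=(5, 8))
loss = contrastive_loss(f_x, f_x_pos, f_x_negs)
```

Minimizing this loss over many (similar pair, negatives) samples is what forces the learned representation to separate latent classes.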
A Sample Complexity Separation between Non-Convex and Convex Meta-Learning
One popular trend in meta-learning is to learn from many training tasks a
common initialization for a gradient-based method that can be used to solve a
new task with few samples. The theory of meta-learning is still in its early
stages, with several recent learning-theoretic analyses of methods such as
Reptile [Nichol et al., 2018] being for convex models. This work shows that
convex-case analysis might be insufficient to understand the success of
meta-learning, and that even for non-convex models it is important to look
inside the optimization black-box, specifically at properties of the
optimization trajectory. We construct a simple meta-learning instance that
captures the problem of one-dimensional subspace learning. For the convex
formulation of linear regression on this instance, we show that the new task
sample complexity of any initialization-based meta-learning algorithm is
Ω(d), where d is the input dimension. In contrast, for the non-convex
formulation of a two layer linear network on the same instance, we show that
both Reptile and multi-task representation learning can have new task sample
complexity of O(1), demonstrating a separation from convex
meta-learning. Crucially, analyses of the training dynamics of these methods
reveal that they can meta-learn the correct subspace onto which the data should
be projected.
Comment: 34 pages
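The initialization-based meta-learning pattern the abstract analyzes, Reptile's outer loop nudging a shared initialization toward each task's adapted parameters, can be sketched for linear regression tasks as follows. This is a minimal illustration, not the paper's two-layer construction; all function names and hyperparameters are assumptions:

```python
import numpy as np

def adapt(w, X, y, lr=0.01, steps=10):
    """Inner loop: a few gradient steps of least-squares regression on one task."""
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

def reptile(tasks, dim, outer_lr=0.1, rounds=200, seed=0):
    """Outer loop: move the shared initialization toward the adapted weights."""
    rng = np.random.default_rng(seed)
    w0 = rng.normal(size=dim) * 0.1
    for _ in range(rounds):
        X, y = tasks[rng.integers(len(tasks))]   # sample a training task
        w_task = adapt(w0.copy(), X, y)
        w0 = w0 + outer_lr * (w_task - w0)       # Reptile update
    return w0

# Toy tasks: every task's true weight vector lies on a shared 1-D subspace,
# mirroring the subspace-learning instance described in the abstract.
rng = np.random.default_rng(1)
direction = rng.normal(size=6)
direction /= np.linalg.norm(direction)
tasks = []
for _ in range(8):
    X = rng.normal(size=(30, 6))
    w_star = rng.normal() * direction            # weight on the shared subspace
    tasks.append((X, X @ w_star))
w0 = reptile(tasks, dim=6)
```

The paper's point is about what such a loop can meta-learn in the non-convex case; the sketch only shows the mechanics of the update.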
NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks
Most existing neural architecture search (NAS) benchmarks and algorithms
prioritize well-studied tasks, e.g. image classification on CIFAR or ImageNet.
This makes the performance of NAS approaches in more diverse areas poorly
understood. In this paper, we present NAS-Bench-360, a benchmark suite to
evaluate methods on domains beyond those traditionally studied in architecture
search, and use it to address the following question: do state-of-the-art NAS
methods perform well on diverse tasks? To construct the benchmark, we curate
ten tasks spanning a diverse array of application domains, dataset sizes,
problem dimensionalities, and learning objectives. Each task is carefully
chosen to interoperate with modern CNN-based search methods while possibly
being far afield from its original development domain. To speed up and reduce
the cost of NAS research, for two of the tasks we release the precomputed
performance of 15,625 architectures comprising a standard CNN search space.
Experimentally, we show the need for more robust NAS evaluation of the kind
NAS-Bench-360 enables by showing that several modern NAS procedures perform
inconsistently across the ten tasks, with many catastrophically poor results.
We also demonstrate how NAS-Bench-360 and its associated precomputed results
will enable future scientific discoveries by testing whether several recent
hypotheses promoted in the NAS literature hold on diverse tasks. NAS-Bench-360
is hosted at https://nb360.ml.cmu.edu.
Comment: NeurIPS 2022 Datasets and Benchmarks Track
A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors
Motivations like domain adaptation, transfer learning, and feature learning
have fueled interest in inducing embeddings for rare or unseen words, n-grams,
synsets, and other textual features. This paper introduces a la carte
embedding, a simple and general alternative to the usual word2vec-based
approaches for building such representations that is based upon recent
theoretical results for GloVe-like embeddings. Our method relies mainly on a
linear transformation that is efficiently learnable using pretrained word
vectors and linear regression. This transform is applicable on the fly in the
future when a new text feature or rare word is encountered, even if only a
single usage example is available. We introduce a new dataset showing how the a
la carte method requires fewer examples of words in context to learn
high-quality embeddings and we obtain state-of-the-art results on a nonce task
and some unsupervised document classification tasks.
Comment: 11 pages, 2 figures, to appear in ACL 201
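The induction step described above can be sketched roughly as: average the pretrained vectors of a feature's context words, then apply a linear transform learned by least squares on words whose embeddings are already known. The synthetic data and names below are illustrative assumptions, not the paper's actual corpora or setup:

```python
import numpy as np

def fit_transform(word_vecs, context_avgs):
    """Least-squares fit of a linear map A such that A @ context_avg
    approximates the pretrained vector of each known word."""
    V, C = word_vecs.T, context_avgs.T           # columns index words
    return V @ C.T @ np.linalg.pinv(C @ C.T)

def embed_new_feature(A, context_vecs):
    """Induce a vector for an unseen word/feature from its contexts on the fly,
    even if only a single usage example is available."""
    return A @ context_vecs.mean(axis=0)

# Synthetic check: if word vectors really are a linear function of their
# average context vectors, least squares recovers that linear map.
rng = np.random.default_rng(0)
d, n = 5, 50
A_true = rng.normal(size=(d, d))                 # hypothetical ground-truth map
context_avgs = rng.normal(size=(n, d))           # avg context vector per word
word_vecs = context_avgs @ A_true.T              # "pretrained" vectors
A = fit_transform(word_vecs, context_avgs)
new_vec = embed_new_feature(A, rng.normal(size=(3, d)))  # 3 usage contexts
```

Once A is fit, inducing an embedding for a rare word or n-gram is a single matrix-vector product, which is what makes the method applicable on the fly.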